FAULT DETECTION AND ISOLATION IN AN OPTICAL NETWORK

Related Applications 
This application claims the benefit of U.S. provisional application serial number 
60/152,138, "Fault Detection And Isolation In An Optical Network," by inventors Kevin 
Hester, Todd Beine, Rainer Iraschko, and Kent Rhyorchuk, filed August 24, 1999, the 
entire contents of which are hereby incorporated by reference. 

Background of the Invention 

1. Field of the Invention 

The present invention relates generally to fault detection and fault isolation in an 
optical network. 

2. Description of Background Art 

Optical data networks typically include a plurality of nodes linked by optical 
fibers into a network. The network may be one of several common topologies, such as a 
linear chain network, an optical star network, or an optical ring network. Optical 
networks are also classified by the geographic size of the network, with wide area 
networks (WANs) and metropolitan area networks (MANs) being of increasing interest 
for providing high bandwidth network data links to and from corporations and LAN 
campuses, for example. 

A popular optical network topology for MANs is an optical ring. As shown in
FIG. 1, an optical ring network 100 typically comprises a sequence of network nodes
105 and at least one primary optical fiber path 110, commonly known as a "working
fiber," coupling data between the nodes. Optical networks transport large flows of
information such that system outages of even a few seconds can cause the loss of huge
quantities of information. This is especially true for wavelength division multiplexing
(WDM) and dense wavelength division multiplexing (DWDM) optical networks, which
simultaneously transmit data in a plurality of optical channels, with each channel
comprising a different optical wavelength.

The reliability of an optical network is an important design consideration. Optical
networks can fail due to several different mechanisms. Line failure is commonly defined
as a fault in the ability of light to be transmitted between nodes along a working fiber,
i.e., there is no light coupled into the node because of damage to the optical fiber.
Additionally, a line failure can occur at or near the interface of the fiber and a node. For
example, the optical fiber may not be properly inserted into the node. Additionally, a
failure of an optical interface element may be optically equivalent to a line fault if it
results in a total loss of signal at all channel wavelengths to all downstream components.
For example, a fault in the optical interface element receiving signals from the fiber that
results in a complete loss of signal to all subsequent optical elements within the node is
equivalent in effect to a fault in the fiber. An electrical equipment failure is commonly
defined as a failure in one or more electrical or electro-optic modules in the node. These
include optical amplifiers, multiplexors/demultiplexers, transponders, and other elements
used to amplify, frequency shift, or to add or drop individual channels or bands. An
electrical equipment failure may result in a loss in all channels, but may more commonly
result in only a limited number of channels being dropped.

Optical networks typically employ several different approaches to permit network 
service to be rapidly restored in the event of a fault. Referring again to FIG. 1, optical 
ring networks typically include at least one protection fiber 115 between each node 105. 
The protection fiber 115 provides an alternate path for optical data in case the primary 
optical fiber 110 becomes broken or damaged along a portion of its length. Additionally, 
the protection fiber facilitates the routing of data to bypass a defective node 105 via a 
path in the protection fiber. In the case of a unidirectional path-switched ring (UPSR) 
the working fiber and the protection fiber commonly carry information in opposite 
directions, e.g., data is commonly transmitted in the working fiber in a clockwise 
direction and in the protection fiber in the counter-clockwise direction. Bidirectional path 
switched rings (BPSR) permit traffic along the ring to be carried in both directions via 
two or more working fibers and two or more protection fibers. 

FIG. 2 is an illustrative diagram of a UPSR ring 200 operating with working and 
protection fiber path links intact. For the purposes of illustration, an optical data path is 
shown between the tributary interfaces of two nodes, NE1 and NE2. As shown in FIG. 3, 
in the event of a fiber break the working traffic is switched to the protection fiber in order 
to maintain the data link between the tributary interfaces of nodes NE1 and NE2. This 
is performed using optical line switching elements (not shown in FIGS. 2 and 3) within a 
node in order to optically switch the path of the optical signals. Note that a complete 
failure of one or more electrical elements within node NE1 or NE2 could also break the 
flow of data. Consequently, nodes NE1 and NE2 typically include redundant electrical 
and electro-optic elements that can be switched into use in the event that one or more
electrical elements in the node fails. This is commonly known as equipment switching. 

As shown in FIG. 1, a network management system (NMS) 120 is typically used 
to regulate the action of the nodes 105 in the event of a line failure or an equipment 
failure in order to restore network service. The NMS 120 typically comprises a central 
workstation computer receiving electrical signals corresponding to the optical strength of 
every optical channel transmitted through each active line of each node. The NMS 120 is 
typically programmed with a list of rules or procedures for handling different types of 
failures. Multi-channel optical-to-electrical-to-optical (OEO) detectors (not shown in 
FIGS. 1-3) in each node can be used to measure the signal strength of each channel 
entering or leaving the node. This permits the NMS 120 to determine if a channel has 
been dropped. If the NMS 120 determines that a channel has been dropped in a particular 
node, the NMS may instruct the node to perform an equipment switch of a component in 
the path of the dropped channel likely to have failed. The NMS 120 monitors the activity 
of all of the nodes, determines if a change in traffic occurs, makes a decision whether a 
line fault or equipment fault has occurred, isolates the fault to a particular node or fiber
path, and issues appropriate commands to all of the nodes to perform one or more 
equipment switches or line switches to restore network traffic. 

While the network management system shown in FIG. 1 improves the reliability 
of network 100, the inventors of the present application have recognized that it has 
several substantial drawbacks, particularly with regard to high performance metropolitan
area networks. First, it can take a significant length of time for a central computer of an
NMS 120 to determine an appropriate course of action due to the cumulative time delays
of the system. There are finite response times for each OEO to measure the signal
strengths of each optical channel to determine if a channel is dropped. There is also a 
significant propagation time for channel status signals to reach the central computer of 
NMS 120. This propagation time includes the time delay for short-haul Ethernet cables 
coupled to the node along with the time delays of the long-haul data link (e.g., a 
telephone line) to the central computer, which may be located several kilometers away 
from an individual node in network 100. There is also a time period required for the 
central computer to assess the state of each node and to make a decision. Yet another
time period is associated with the time delay required to transmit control signals from the 
central computer of NMS 120 back to each node via Ethernet and long-haul connections. 
There is also a time delay associated with the circuitry at the node that is used to 
implement a line switch or an equipment switch. In a conventional MAN system 100 the 
total elapsed time between the detection of a failure and a line switch or equipment 
switch being implemented can exceed 0.1 seconds. One industry standard that has 
evolved is that a communication disruption lasting more than 50 milliseconds constitutes 
a network outage, i.e., tributary networks receiving and transmitting data via network 100 
are designed upon the assumption that optical network 100 has outages of less than 50 
milliseconds. Network outages in excess of 0.1 seconds may therefore cause an 
irreparable loss of data to a tributary network. 

Another drawback of network 100 is that the NMS 120 can be comparatively 
expensive to implement. The central computer is often implemented as a high 
performance work station, which is comparatively expensive. Another substantial cost is 
associated with the OEO modules used to measure channel strength in each node 105. 
OEO modules increase in cost as a function of the number of optical channels that they
are capable of analyzing. Additionally, the cost of each OEO module tends to increase
with the data capacity of each channel since faster optical and electrical components are
required for high data rate channels. Advances in DWDM technology now permit thirty
or more high data-rate channels to be implemented in an OEO module. This results in a
corresponding increase in the cost of the OEO modules compared with first generation
WDM designs having three to five moderate data-rate channels.

Another drawback of network 100 is that it may provide insufficient information 
to isolate electrical equipment failures for later repair. The increase in the number of 
channels in DWDM systems has led to multistage node designs.
The stages commonly include various combinations of band pass filters, channel filters,
wavelength shifters, optical amplifiers, multiplexors, and demultiplexers. Each stage, in
turn, may host one channel, several channels, most of the channels, all of the channels, or
frequency shifted versions of the channels, depending upon the function of the stage. A
single OEO module is typically insufficient to determine the element within a node that
has dropped a channel. Consequently, several OEO modules may be required for fault
isolation, further increasing the cost of the system.

Therefore, there is a need for an improved system and method for performing 
fault detection, isolation, and network restoration in an optical network. 



Summary of the Invention

The present invention generally comprises a fault detection and isolation system
and method for optical networks. One aspect of the present invention is that a local
controller in each node makes decisions on whether to activate fault restoration elements
within the node, eliminating the need for a central computer to coordinate the actions of
each node in response to a fault.

Generally speaking, an embodiment of an optical node of the present invention
comprises at least one fault restoration element for restoring network traffic in
response to a fault; at least one optical sensor coupled to the node for measuring a first set
of optical characteristics of the optical channels coupled to the node; a signal sensor for
receiving data from another device corresponding to a second set of optical
characteristics of the optical channels; and a controller for adjusting the operation of said
at least one restoration element as a function of said first and said second set of optical
characteristics, whereby said controller determines a network fault requiring local action
and directs said at least one restoration element to perform a restoration instance. In
preferred embodiments the fault restoration element may include a line switcher, a
redundant electrical or electro-optic element, or a combination of a line switcher and
redundant electrical or electro-optic element. In one embodiment the controller is a
microprocessor having a software program residing on the microprocessor that includes a
problem list for correlating the occurrence of potential faults from the first and second set
of optical characteristics. In a preferred embodiment the signal sensor is a transceiver for
communicating channel status messages with a neighboring node via an optical channel.



Broadly speaking, the present invention also includes methods of fault detection
and isolation. One embodiment is for an optical network having a plurality of optical
nodes, each node including at least one local optical sensor for measuring optical
characteristics of the datastream at the local node, at least one transceiver for
communicating data to each neighboring node that it is coupled to via a fiber optic link,
and each node having a local controller for controlling at least one local restoration
element, the method comprising the steps of: sensing a set of optical characteristics of the
datastream at each node; updating a channel map of active channels at each node of the
optical network; and communicating the updated channel map to the nodes via the fiber
optic link; wherein each local controller compares the optical characteristics measured at
the local node to the channel map to determine if a fault has occurred requiring that the
local controller activate a restoration element.

The method of the present invention may be practiced with nodes having fault
restoration elements that include line switchers, redundant electrical or electro-optic
elements, and combinations thereof. In one embodiment the method of fault detection
and isolation is for a node of an optical network having a datastream with a plurality of
optical channels, the network including a plurality of nodes coupled to each neighboring
node, each node having at least one local optical sensor, each node having at least one
optical transceiver for communicating status reports to each neighboring node that it is
optically coupled to, and each node having a local controller for controlling a local line
switcher residing in the node, the method comprising the steps of: sensing a loss in signal
from a neighboring node via the local optical sensor; monitoring the transceiver to
determine if the neighboring node is communicating status reports to the node; and
initiating a line switch to redirect traffic to an alternate optical path to restore data traffic
if there is a loss in signal from the neighboring node and status reports are not being
received from the neighboring node. In another embodiment the method of fault
detection and isolation is for a node of an optical network having an optical datastream
with a plurality of channels, the network including a plurality of nodes optically coupled
to each neighboring node, each node having at least one local optical sensor, at least one
transceiver for communicating data to each neighboring node that it is coupled to, and a
local controller for controlling redundant elements residing in the node, the method
comprising the steps of: sensing a first set of optical characteristics of the optical
channels traversing the node; receiving status reports that include a second set of optical
characteristics of the optical channels measured by at least one sensor in another node of
the network; comparing the first and second set of optical characteristics; determining if
one or more optical channels are being dropped in the node; and initiating an equipment
switch in the local node to restore the dropped traffic.
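
By way of illustration only, the two node-level methods summarized above may be sketched as a single decision routine. The sketch below is not the claimed implementation; the names NodeObservation, Action, and decide are hypothetical, and the order in which the two tests are applied, as well as the handling of cases the text does not address, are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    LINE_SWITCH = "line_switch"
    EQUIPMENT_SWITCH = "equipment_switch"


@dataclass(frozen=True)
class NodeObservation:
    """Hypothetical snapshot of what one node knows at a given instant."""
    loss_of_signal: bool            # local optical sensor sees no light from the neighbor
    status_reports_arriving: bool   # transceiver still receives the neighbor's status reports
    local_active: frozenset         # channels the local sensors report as active (first set)
    upstream_active: frozenset      # channels another node reports as active (second set)


def decide(obs: NodeObservation) -> Action:
    # Line switch: loss of signal AND the neighbor's status reports have stopped.
    if obs.loss_of_signal and not obs.status_reports_arriving:
        return Action.LINE_SWITCH
    # Equipment switch: channels reported active upstream are missing locally,
    # which suggests they are being dropped inside this node.
    if obs.upstream_active - obs.local_active:
        return Action.EQUIPMENT_SWITCH
    # Assumption: every other combination requires no local action.
    return Action.NONE
```

For instance, a node that sees channels 1 and 2 reported upstream but only channel 1 locally would, under these assumptions, select an equipment switch rather than a line switch.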



Brief Description of the Drawings 

FIG. 1 is a block diagram of a prior art optical network having a central network
management system for detecting faults and restoring network traffic.

FIG. 2 is a block diagram of a prior art optical ring network having a central 
network management system controlling traffic between working fibers and protection 
fibers. 

FIG. 3 is a block diagram of the prior art network of FIG. 2 showing how the
central network management system redirects traffic in response to a line fault.

FIG. 4 is a block diagram of an embodiment of an optical network in accordance
with the present invention. 

FIG. 5 is a block diagram of an embodiment of an optical network in accordance 
with the present invention. 

FIG. 6 is a block diagram showing a preferred arrangement of
multiplexing/demultiplexing elements within the nodes of a wavelength division 
multiplexing optical network in accordance with the present invention. 

FIG. 7 is a functional block diagram showing a preferred node arrangement in 
accordance with the present invention. 

FIG. 8 is a block diagram of a preferred embodiment of the node of FIG. 7. 

FIG. 9 shows an embodiment having node components arranged on field
replaceable circuit packs communicatively coupled to processor and memory modules. 

FIGS. 10A and 10B are portions of an exemplary decision table used to detect
and isolate faults in a ring network. 

FIG. 11 is a functional block diagram of a fault detection and isolation controller
system in accordance with the present invention. 

FIG. 12 is an interaction diagram showing a preferred sequence of steps for 
initiating an equipment switch instance. 

FIG. 13 is an interaction diagram showing a preferred sequence of steps for 
initiating a line switch instance. 

Detailed Description of the Preferred Embodiments 

The figures depict a preferred embodiment of the present invention for purposes 
of illustration only. One of skill in the art will readily recognize from the following 
discussion that alternative embodiments of the structures and methods disclosed herein
may be employed without departing from the principles of the claimed invention. 

FIG. 4 is a block diagram of a portion of an optical network 400 illustrating some 
of the general principles of the present invention. For the purposes of illustration many 
conventional elements used in optical networks are omitted. It will also be understood 
that optical network 400 may be part of a larger chain, branched chain, mesh, or ring 
network. A plurality of network optical nodes 405 are shown coupled by fiber spans 410. 
The fiber spans 410 may comprise one or more optical fiber lines that include all of the 
potential optical channel data links between neighboring nodes for communicating an 
optical datastream. For the purposes of illustration, the optical paths in the fiber lines are 
shown by arrows, although it will be understood that the drawing is not to scale and that 
each arrow corresponds to at least one fiber coupled to corresponding optical ports 460 at 
each node. As described below in more detail, in a preferred embodiment node 405 has a 
plurality of ports arranged so that a line switcher may be used to redirect the optical
datastream to an alternate output port in the event of a line fault. For example, a fiber span 410
may comprise a working fiber line and a protection fiber line each coupled to a node by 
respective ports. In a preferred embodiment for wavelength division multiplexing, each 
fiber line comprises a single spatial-mode optical fiber capable of transmitting a plurality 
of optical wavelengths. 

Each node 405 includes its own local restoration elements 420 to restore network 
service in the event of a fault that could disrupt network traffic. The restoration elements 
420 preferably include a line switcher responsive to a line switch command to re-direct 
traffic away from a faulty line of a span 410 and onto an alternate line by selecting an 
alternate optical pathway between two ports of the node, i.e., changing the port from
which the datastream exits the node such that the datastream is redirected onto an
alternate line. The restoration elements also preferably include redundant electrical or
electro-optic elements responsive to an equipment switch command to maintain network
traffic in the event of a failure of an electrical or electro-optic component. Examples of
redundant electro-optic elements include but are not limited to: redundant band pass
filters, redundant channel filters, redundant multiplexors, redundant demultiplexers,
redundant optical detectors, redundant optical amplifiers, or redundant transponders. It
will be understood that while there is at least one fault restoration element 420,
various combinations of fault restoration elements may be included depending upon the
particular application. Each node has its own local controller 430 that determines if a
fault has occurred and that regulates the actions of the restoration elements 420 within the
node.

An individual controller 430 may use several different sources of information to 
make a decision whether to activate the restoration elements 420 of its node. First, each
node 405 includes at least one internal optical sensor 418 for measuring a first set of 
optical characteristics of the optical channels at the node. The first set of optical 
characteristics corresponds to information on channel activity that may be determined by 
the optical sensors at the local node, such as information regarding a complete loss of 
signal (LOS) in a single channel, a band of channels, or all the channels, depending upon
the resolution of the sensor and the number of channels received by the sensor. The 
optical sensor 418 may be part of a pre-amplification or post-amplification component of 
the node. Additionally, each node 405 may also include other internal elements, such as 
P-I-N photodiodes, configured to provide information on the signal strength of individual
channels or bands of channels. P-I-N photodiodes are a type of photodetector that has a
lightly doped intrinsic semiconductor sandwiched between p-type and n-type
semiconductors. However, optical sensors, such as P-I-N detectors, that are not otherwise
required to perform a necessary node function (e.g., multiplexing or demultiplexing) 
increase the cost of network 400 and introduce extra signal loss that must be compensated 
for with optical amplifiers. Consequently, it is desirable to use optical sensors required 
for other functions to provide information on the characteristics of the optical channels. 
For example, some types of transponders include an inherent sensing capability that may 
be used to provide information on the characteristics of the optical channels (e.g., the 
presence of optical power in a channel received by the transponder). The required 
resolution of an optical sensor 418 depends upon whether the sensor is coupled to a 
single channel, band of channels (e.g., three channels), or to all of the channels. 
Conventional P-I-N optical detectors commonly have sufficient resolution to measure the
loss of a single channel from a band having a small number of channels (e.g., a band
having three to five channels), but commonly lack sufficient resolution to determine if a
single channel has been dropped from a large number of channels, such as when thirty
optical channels are coupled to a P-I-N detector.
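
The resolution requirement can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every channel carries equal optical power, so that losing one of N channels reduces the total detected power by 10*log10(N/(N-1)) dB; the function name single_channel_loss_db is hypothetical.

```python
import math


def single_channel_loss_db(n_channels: int) -> float:
    """Drop in total detected power (dB) when one of n equal-power channels is lost."""
    if n_channels < 2:
        raise ValueError("a partial loss needs at least two channels on the detector")
    return 10 * math.log10(n_channels / (n_channels - 1))


# Compare a small band with a full thirty-channel DWDM signal.
for n in (3, 5, 30):
    print(f"{n:2d} channels: {single_channel_loss_db(n):.2f} dB")
```

Under this equal-power assumption the drop is roughly 1.76 dB for a three-channel band but only about 0.15 dB for thirty channels, which suggests why a broadband detector may struggle to register a single dropped channel in the latter case.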

Second, in a preferred embodiment each node receives information from at least 
one other element, such as an upstream element in a unidirectional path-switched
ring (UPSR). This information from the upstream element helps the
node determine if individual channels or bands of channels have been dropped (lost)
prior to entering the node. In a preferred embodiment, each node receives status
information from upstream nodes via an optical supervisory channel (OSC)
communicated by the optical span. This status information preferably includes the
information that at least one upstream controller has measured or otherwise collected
regarding the status of the network channels. The status information may include a 
second set of characteristics for the channels upstream of the node along with other 
network status information. The second set of optical characteristics may, for example, 
include information on channel activity measured at the upstream node. Additionally, the 
second set of optical characteristics may also include a channel map of channel activity of 
a plurality of nodes. For example, referring to FIG. 4, node 2 may record a set of 
characteristics for the optical channels traversing node 2 using its own optical sensors and 
transmit this information to node 1 via the OSC. Node 2 may, for example, communicate
to node 1 that it is receiving optical signals (e.g., from node 3) or that some of its 
transponders or other optical detectors have experienced a loss of power in one or more 
channels or bands of channels. Additionally, node 2 may relay to node 1 messages that 
include network status information that it has received from other nodes. For example,
the messages transmitted via the OSC channel may include a channel map
of active channels that is updated and forwarded at each node as it is passed along to
downstream nodes. Additionally, the OSC channel may be used by a node to transmit 
status and alert messages that are also forwarded to other nodes of the network. 
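
The forwarding of a channel map from node to node may be pictured with the following minimal sketch. The dictionary layout, the node identifiers, and the helper names forward_channel_map and dropped_between are assumptions made for illustration; the actual message format carried on the OSC is not specified here.

```python
def forward_channel_map(received_map: dict, node_id: str, locally_active: set) -> dict:
    """Return the map to pass downstream: upstream entries plus this node's own view."""
    updated = dict(received_map)               # leave the received message unmodified
    updated[node_id] = sorted(locally_active)  # record the channels seen active here
    return updated


def dropped_between(channel_map: dict, upstream_id: str, node_id: str) -> set:
    """Channels active at the upstream node but not at the given node."""
    return set(channel_map[upstream_id]) - set(channel_map[node_id])


# Traffic flows node3 -> node2 -> node1, as in the FIG. 4 example.
channel_map = forward_channel_map({}, "node3", {1, 2, 3})
channel_map = forward_channel_map(channel_map, "node2", {1, 2, 3})
channel_map = forward_channel_map(channel_map, "node1", {1, 3})
print(dropped_between(channel_map, "node2", "node1"))  # channel 2 was lost after node2
```

Because each node appends its own entry rather than overwriting the map, a downstream controller in this sketch can tell where along the path a channel disappeared.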

The OSC channel also permits the nodes to communicate messages on planned 
restoration events to each other, thereby permitting the nodes to coordinate their actions. 
For example, node 2 may send a signal announcing to its neighbors that it is about to 
initiate an equipment switch that will cause a short disruption in traffic through node 2. 
This information will alert node 1 to not interpret a short disruption in upstream traffic as 
a line fault. Additionally, the OSC channel permits the nodes to share information on the 
results of a restoration event. For example, if node 2 initiates an equipment switch that is 
unsuccessful in restoring network traffic then node 2 can communicate this information
to the other nodes as a failed restoration instance message. This may provide information 
to another node that enables that node to determine if an equipment switch is necessary at 
that node. This so-called "rolling equipment switch" mode of restoration is made 
possible by each node sharing data on the results of an equipment switch with its 
neighbors. The OSC channel thus provides a means for the nodes to communicate a 
variety of status information, including channel status and coordination information. 
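
One way a downstream node might use an announced equipment switch is to hold off its line-fault interpretation for a short interval after the announcement. The sketch below is a hypothetical illustration only: the class NeighborWatch, its method names, and the 50 millisecond hold-off value are assumptions and are not taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborWatch:
    """Tracks announced disruptions from one upstream neighbor (hypothetical helper)."""
    holdoff_s: float = 0.05  # assumed grace period in seconds; not a value from the text
    _quiet_until: float = field(default=float("-inf"), init=False)

    def on_planned_switch(self, now: float) -> None:
        # The neighbor announced an equipment switch over the OSC channel.
        self._quiet_until = now + self.holdoff_s

    def is_line_fault_suspected(self, now: float, loss_of_signal: bool) -> bool:
        if not loss_of_signal:
            return False
        # A loss of signal inside the announced window is treated as expected.
        return now >= self._quiet_until


watch = NeighborWatch()
watch.on_planned_switch(now=1.00)
print(watch.is_line_fault_suspected(now=1.02, loss_of_signal=True))  # inside the window
print(watch.is_line_fault_suspected(now=1.06, loss_of_signal=True))  # window has expired
```

Times are passed in explicitly so that the logic can be exercised without a real clock; a deployed controller would presumably use its own timing source.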

In an alternate embodiment, an optical analyzer 440, such as an optical spectrum
analyzer (OSA), may be used to measure information on the channel activity and
communicate it to a node 405. This information may include information on the number 
of optical channels transmitted along the fiber to the optical analyzer. Additionally, other 
information may also be recorded, such as information on channel width (Hz) or other 
information indicative of a failure of a laser, amplifier, or electro-optic component in a 
network node. As shown in FIG. 4, the OSA 440 may be electrically coupled via 
electrical connections 445 to a signal sensor 447 of a downstream node if the OSA is 
disposed a comparatively short distance from the node. However, if the OSA is located a 
substantial distance upstream of the node it may be desirable to communicatively couple 
the OSA to the downstream node using an optical data channel of the fiber span, i.e. an 
optical sensor 418 is used to perform the function of the signal sensor. 

One advantage of optical network 400 is that each node 405 benefits from the 
optical channel information of other devices, particularly upstream nodes. This means 
that an individual node 405 requires comparatively fewer internal optical sensors (e.g.,
P-I-N photodetectors) in order to have the capability to detect and isolate an equipment
fault. Additionally, each individual node 405 receives information from its neighboring
nodes that allows it to determine if a line fault has occurred between it and its 
neighboring node, thus enabling the node to make a decision whether a line switch is 
appropriate. 

Another benefit of network 400 is that it eliminates the need for centralized 
control of network restoration. Node controller 430 may be implemented as a 
comparatively low cost control module. For example, node controller 430 may be a local 
computer, microcomputer, microprocessor, microcontroller, or dedicated circuit 
configured to control the actions of the restoration elements within the node in which it
resides. In a preferred embodiment, node controller 430 is a comparatively inexpensive
microprocessor programmed to act as a local computer that controls the actions of the
restoration elements 420 within the node in which it resides.

The use of a local controller 430 at each optical node 405 results in fast response 
times compared to a conventional network 100 having a central NMS 120. There are currently two
important restoration time standards in the industry for a maximum acceptable 
interruption in network traffic. These restoration time standards correspond to a fifty 
millisecond standard and also a one hundred millisecond standard developed by 
BELLCORE. Estimates by the inventors indicate that a metropolitan area network 
having a ring topology constructed in accord with the teachings of the present invention 
can restore line faults in a time period less than either common industry standard, i.e., 
less than either one hundred milliseconds or fifty milliseconds. This is in contrast to a 
conventional optical network 100 having a central NMS 120, which may require up to 
one second to restore traffic after a line failure in a metropolitan area network. 






FIG. 5 is a block diagram of a generalized representation of two neighboring nodes
of an optical network 500 coupled by an optical span 510. As shown in FIG. 5, an optical 
signal sensor 520 (e.g., a P-I-N photodetector) is used to convert the OSC optical channel 
into an electrical signal (i.e., optical to electrical conversion of the OSC channel). The 
OSC channel is preferably tapped off using an optical filter or demultiplexer disposed in 
a line card element. A microprocessor controller 530 is preferably used to host a fault 
detection and isolation computer program at each node. The fault detection and isolation 
program correlates available information on optical channel characteristics in the node 
with status information received from neighboring nodes to detect changes in channel 
activity indicative of a line fault or equipment fault in the node. The program preferably 
includes a decision table or algorithm to determine if a line switch or equipment switch is 
required. As shown in FIG. 5, a line switcher 525 is preferably included to redirect 
optical traffic to the port of an alternate optical line in the event of a line failure. A 
wavelength division multiplexing (WDM) optical network typically includes electro-optic 
elements 530 such as multiplexors, demultiplexers, and optical filters. A set of 
redundant electro-optical elements 535, R, is preferably also included to permit an 
equipment switch. Each node may also include a transponder 540 to convert the channels 
into an optical frequency or format appropriate for a tributary network coupled to the 
node. In some cases the transponder includes optical power level capability, i.e., the 
frequency or format conversion performed by the transponder inherently involves 
determining that the channel is active. FIG. 6 is a block diagram illustrating a preferred 
arrangement of band filters, channel filters, and wavelength conversion transponders for 
an optical ring network. For the purposes of illustration, the line switcher, controller, and 
redundant elements are not shown in FIG. 6. 

FIG. 7 is a block diagram of a preferred node structure 700 for an optical ring 
network. The functional attributes of each block are also shown. It will be understood 
that many variations on the structural and functional relationship of node 700 are 
encompassed by the present invention. 

As shown in FIG. 7, in a preferred node structure 700 a control section 710 
includes a configuration database, operational interfaces, and software modules for 
inter-node communications via OSC channels. The common control section provides software 
administration and control of the node and preferably includes a PC compatible processor 
element module and persistent storage module. The processor element module preferably 
runs an Embedded WINDOWS NT operating system. 

The transport section 720 includes elements for line amplification, a line switcher, 
measuring the OSC signal, and transmitting or terminating (if desired) a line. The 
transport section 720 is configured to receive four ports corresponding to two West-line 
ports and two East-line ports, in accord with standard terminology in UPSR and BLSR 
rings that a first span comprising a working fiber and a protection fiber is coupled via 
two ports 722, 724 to one side of transport section 720 whereas a second span including 
another working fiber and protection fiber is coupled via ports 726, 728 to the other side 
of transport section 720. The transport section terminates the spans between nodes. It 
divides the received optical signal into working, protection, and un-switched band groups 
and provides protection switching in BLSR and UPSR systems. It preferably includes an 
optical amplifier such as an Erbium doped fiber amplifier to boost optical DWDM 
transmission levels. 

The multiplex section 730 includes electro-optic elements for performing 
wavelength division multiplexing operations and has redundant elements for equipment 
protection. The multiplex section 730 is preferably implemented as a two-stage optical 
multiplexor that aggregates signals from the tributary section 740 into the line side 
DWDM format and splits received line-side signals into the individual channels used by 
the tributary section. The first multiplex stage combines the individual channels 
launched by the tributary transponder wavelength converter interfaces (WCIs) into 
three-channel wide bands. Each band is preferably fed into the aggregated line-side signal by a 
band wavelength division multiplexor (BWDM) in the second stage. The BWDMs 
multiplex (add) and demultiplex (drop) groups of three wavelengths for further 
subdivision by channel wavelength division multiplexors (CWDMs). The CWDM 
modules multiplex a set of three outgoing wavelengths into a band and demultiplex the 
corresponding incoming band of wavelengths into three individual channels. CWDMs 
also switch traffic between the primary and backup BWDM modules in 
redundant applications. In a protection switching configuration, a signal carried on a 
working band is restored by switching it to the equivalent protect band on the other fiber. 
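
By way of illustration only, the grouping of individual channels into three-channel wide bands may be sketched as follows. The zero-based channel and band numbering and the function names `band_of` and `channels_in_band` are assumptions made for this sketch and are not taken from the figures.

```python
from typing import List

# From the description: each band is three channels wide.
CHANNELS_PER_BAND = 3


def band_of(channel: int) -> int:
    """Return the band index that carries a channel (hypothetical 0-based numbering)."""
    if channel < 0:
        raise ValueError("channel index must be non-negative")
    return channel // CHANNELS_PER_BAND


def channels_in_band(band: int) -> List[int]:
    """Return the three channel indices that a CWDM would combine into one band."""
    if band < 0:
        raise ValueError("band index must be non-negative")
    start = band * CHANNELS_PER_BAND
    return list(range(start, start + CHANNELS_PER_BAND))
```

Under this assumed numbering, a loss of all three channels returned by `channels_in_band` for one band points at the band level (CWDM or BWDM) rather than at a single transponder.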

The tributary section 740 includes transponders (WCIs) for wavelength 
translation, tributary network ports 745, and has add/drop signal routing capability. The 
tributary section is the point of connection for client optical signals and it converts the 
line side DWDM frequencies into short reach signals, such as 850 nm and 1310 nm 
signals. Each WCI of the tributary section has two transmitter-receiver pairs: one for the 
client-side signals and the other facing the multiplex section and operating at a specific 
DWDM frequency. Between these is an electrical cross-connect capable of routing 
signals between the tributary and line side transceivers. 

FIG. 8 shows a block diagram of a portion of a preferred embodiment of a WDM 
node for optical rings. Referring to both FIGS. 7 and 8, the transport section generally 
comprises line cards 805, optical pre-amps 810 and post amps 815, a working protect 
splitter (WPS) section 820, and a ring switch module (RSM) 825 for performing a line switch. 
The WPS section 820 preferably is configured to permit working and protect traffic to be 
demultiplexed. During normal operation (i.e., all fiber lines functional) it is desirable to 
have the option to transmit data, such as lower priority data, on the protection fibers (i.e., 
use both the working and protect fibers). WPS section 820 permits data on both working 
and protection lines to be demultiplexed during normal operation. Additionally, optical 
switches are also shown in the WPS section 820 for switching light via the RSM 825. 
The multiplex section generally comprises band wavelength division multiplexing 
(BWDM) sections 830 and channel wavelength division multiplexing 
(CWDM) sections 835. The transponders 840 (XPND), also known as wavelength 
converter interfaces (WCIs), correspond to a portion of the tributary sections. There are 
redundant BWDM sections 845. The overlapping BWDM and XPND sections illustrate 
redundant components. Optical switches are included in the CWDM modules to permit a 
switch to the redundant BWDM modules. 

A preferred arrangement of optical detectors is also shown. The pre-amp and 
post-amplification elements include an inherent ability to determine if they are receiving 
optical power. Similarly, the XPND units include an inherent optical signal detection 
capability. Optical detectors are preferably also included in the WPS and CWDM 
modules. In the preferred embodiment the BWDM modules do not include optical 
detectors. The transceiver 850 for receiving OSC signals also includes an optical sensor 
capability. 

A node for dense wavelength division multiplexing (DWDM) may carry a large 
number of different wavelengths, with each wavelength corresponding to one channel. 
Consequently, the physical implementation of a DWDM node 400, 500, 700, 800 may 
require a plurality of CWDMs, BWDMs and XPNDs. The components are preferably 
arranged on a plurality of field-replaceable circuit packs coupled by short optical jumper 
links to facilitate a field engineer making rapid field repairs of failed components by 
swapping circuit packs. FIG. 9 is a block diagram of a partial view of a node with 
field-replaceable circuit packs showing a plurality of circuit packs 910 coupled by a local 
Ethernet connection to at least one OSC/COM module 920. The OSC/COM module 920 
contains circuitry for inter-node communication via the OSC channel. The OSC 
channel may communicate data in a variety of data formats used in digital networks, such 
as TCP/IP, Ethernet, or ATM format. The OSC links may be configured to form a 
neighboring node data link. Alternately, data packets may contain address information 
(e.g., data frames) for transferring data further along the network. Additionally, the 
OSC/COM module 920 preferably includes circuitry for coordinating the communication 
of each circuit pack 910 to an administrative complex 930 corresponding to a 
microprocessor 940 and memory storage module 950. The OSC/COM module also 
preferably receives signals from each circuit pack indicative of problems with the circuit 
pack, such as a loss of electrical power to the circuit pack or internal self-diagnostic 
signals from the circuit pack (e.g., an abnormal electrical characteristic indicative of a 
failed component or a signal from an optical detector residing on the circuit pack). 

It will be understood that a software program for performing fault detection and 
isolation may have slightly different algorithms depending upon the arrangement of 
optical elements within each node, the number and arrangement of optical sensors for 
measuring the optical characteristics of the channels within a node, and the number of 
elements, such as transponders, capable of providing an output indicative of channel 
activity. In the most general case a network engineer would analyze the node design and 
the network topology to produce a "problem list" of likely problems for each signal 
indicating a possible loss of a signal channel, plurality of channels, loss of a band, loss of 
a plurality of bands, or loss of all channels within the node. 

Each problem entry in the problem list may have an associated entry describing 
additional information from other nodes that can be correlated (compared) with the local 
loss of signal in order to isolate the problem to one or more likely causes. However, in 
some cases there may be insufficient information to uniquely isolate the problem to a 
single cause. In this case, the node may attempt a local solution (e.g., a line switch or an 
equipment switch) that is the most likely to restore traffic. If the restoration event does 
not result in a restoration of the dropped channel(s), then this failure may be reported to 
other nodes (e.g., a list of faulty components published) in order to assist those nodes to 
make an appropriate equipment switch or line switch. As an illustrative example, a first 
node may build a list of potentially failed components within the first node based upon 
information from a channel map (a list of active channels at various locations in the 
network) distributed through the OSC and the dropped channel(s) observed by sensors 
coupled to the first node. The first node may then attempt an equipment switch of one or 
more of the components in its problem list of potentially failed components. If the 
equipment switch does not restore the dropped traffic (i.e., restore the dropped 
channel(s)) the node forwards a summary of the failed equipment switch event that may 
assist other nodes to detect and isolate the problem. For example, the summary of the 
failed equipment switch event may include a list of dropped channel(s), the components 
in the first node suspected of having failed, and the result of the equipment switch (e.g., 
which of the dropped channels the equipment switch did not restore). An upstream node 
may, in turn, have a problem list that includes an entry instructing the 
upstream node to initiate an equipment switch in response to several criteria, including 
initiating an equipment switch if it receives a report that the first node noticed dropped 
channel(s) that an equipment switch in the downstream node did not restore. It will be 
understood that the problem list in the upstream node may also include other criteria for 
initiating an equipment switch used in combination with the summary of the failed 
equipment switch, e.g., the problem list of the upstream node may recommend that an 
equipment switch be initiated in the upstream node for the situation that the upstream 
node receives the summary of a failed equipment switch from the downstream node and 
one or more other abnormal conditions are detected at the upstream node. 
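
The exchange described above may be sketched, purely for illustration, as a small record published by the downstream node and a rule evaluated by the upstream node. The names `FailedSwitchSummary` and `upstream_should_switch`, the field names, and the `min_abnormal` threshold are hypothetical; the description does not prescribe a packet layout or a specific threshold.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FailedSwitchSummary:
    """Hypothetical summary a node publishes after an unsuccessful equipment switch."""
    reporting_node: str
    dropped_channels: FrozenSet[int]
    suspected_components: FrozenSet[str]
    unrestored_channels: FrozenSet[int]


def upstream_should_switch(summary: FailedSwitchSummary,
                           local_abnormal_conditions: int,
                           min_abnormal: int = 1) -> bool:
    """Assumed rule: switch upstream only if the downstream switch left channels
    dropped AND enough abnormal conditions are also seen at the upstream node."""
    return bool(summary.unrestored_channels) and local_abnormal_conditions >= min_abnormal
```

Requiring a local abnormal condition in addition to the downstream report corresponds to the combined criteria mentioned above. A problem list could equally set `min_abnormal` to zero so that the report alone triggers the switch.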

A problem list that associates a list of potential problems with various changes in channel 
status can be created for a particular node design and arrangement of photodetectors. 
For each fault listed in the problem list, there is a set of rules for isolating the 
problem and selecting an appropriate restoration response. This information is preferably 
encoded as part of a fault detection and isolation program residing on the controller. The 
fault detection and isolation program correlates system faults and initiates an appropriate 
restoration action, such as a line switch or an equipment switch. The fault detection and 
isolation program preferably includes a wavelength management monitor that maintains a 
list of wavelengths in the network and the status of each channel with respect to source 
and destination nodes. Status change events are preferably communicated to neighboring 
(peer) nodes via the OSC. This permits each node to acquire a map of the network 
topology (e.g., a ring map for a ring network) and a channel table of active channels, and to 
send or receive path change signals (e.g., send or receive information to neighboring 
nodes notifying the neighboring node of a line switch event). Thus, each controller keeps 
a dynamic model of all of the connections to the node, the network topology, the signal 
paths, and the channel status. The fault detection and isolation program also preferably 
monitors the optical sensors (e.g., the photodetectors in the BWDMs and WPSs), the transponders 
(WCIs), the OSC channel, and any other photodetectors of the node that may provide 
information that can be correlated with a fault. The fault detection and isolation program also 
preferably includes a switch engine that is configured to generate a switch event packet 
that is forwarded to neighboring nodes via the OSC in order to alert neighboring nodes 
that a switch event is about to happen. Additionally, the fault detection and isolation program 
preferably notifies other nodes (e.g., neighboring upstream nodes) if a local equipment 
switch has failed, thereby providing the other nodes with status information that can be 
used to make equipment switch decisions. 
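
As a minimal sketch of the wavelength management monitor described above, the channel table may be kept as a mapping from channel to status, updated by status change events. The class names, the event fields, and the convention of returning a flag when a change should be forwarded to peer nodes are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChannelStatus:
    """Status of one wavelength with respect to its source and destination nodes."""
    source: str
    destination: str
    active: bool


@dataclass
class WavelengthMonitor:
    """Hypothetical per-node channel table kept current by status change events."""
    table: Dict[int, ChannelStatus] = field(default_factory=dict)

    def apply_event(self, channel: int, source: str, destination: str, active: bool) -> bool:
        """Record an event; return True if the table changed, i.e. the event
        is new information that would be forwarded to peer nodes over the OSC."""
        new_status = ChannelStatus(source, destination, active)
        changed = self.table.get(channel) != new_status
        self.table[channel] = new_status
        return changed

    def active_channels(self) -> List[int]:
        """Return the sorted list of channels currently marked active."""
        return sorted(ch for ch, st in self.table.items() if st.active)
```

Forwarding only events that change the table is one way to keep repeated reports from circulating indefinitely around a ring; the description itself does not specify how duplicates are suppressed.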

Table 1 is an illustrative problem list showing signal loss conditions, their likely 
causes, and suggested fault isolation responses. As can be seen in Table 1, information 
received from the OSC channel assists the controller of the local node in making 
equipment switch decisions. 



Node Condition     | Likely Causes                   | Fault Isolation Response
-------------------+---------------------------------+---------------------------------------
No OSC signal      | 1) Line failure to neighboring  | 1) Initiate line switch if there is
received at OSC    |    node; or                     |    also no optical power at input
detector           | 2) Failure of local OSC         |    buffer.
                   |    detector                     | 2) Institute "OSC detector failure
                   |                                 |    report" if optical power at input
                   |                                 |    buffer.
-------------------+---------------------------------+---------------------------------------
No power at input  | 1) Line failure between         | 1) If no OSC signal, initiate ring
buffer             |    upstream neighboring node;   |    switch; or
                   |    or                           | 2) if OSC signal present, do not
                   | 2) line failure upstream of     |    initiate ring switch.
                   |    neighboring node             |
-------------------+---------------------------------+---------------------------------------
No power at one    | 1) Failure of transponder;      | If upstream node reports activity in
transponder        | 2) dropped channel in node;     | channel, activate equipment switch to
                   | 3) dropped channel upstream; or | elements in node along path of
                   | 4) channel not presently in     | transponder. If problem not solved,
                   |    operation.                   | report failure to other nodes.
-------------------+---------------------------------+---------------------------------------
No power of all    | 1) Failure of BWDM;             | If upstream nodes do not report a loss
CWDMs linked to a  | 2) failure of multiple CWDMs;   | of power in affected channels,
common BWDM        |    or                           | initiate an equipment switch of BWDM.
                   | 3) loss of channels upstream    |

Table 1: Illustrative Problem List For An Optical Network 
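
The responses of Table 1 that depend on the OSC signal and the input buffer power, together with the single-transponder row, may be sketched as simple decision functions. This is a minimal illustration assuming boolean sensor inputs; the names `Action`, `osc_and_power_response`, and `transponder_response` are hypothetical and are not taken from the decision tables of the figures.

```python
from enum import Enum


class Action(Enum):
    LINE_SWITCH = "initiate line (ring) switch"
    OSC_DETECTOR_REPORT = "issue OSC detector failure report"
    EQUIPMENT_SWITCH = "initiate equipment switch"
    NONE = "no switch"


def osc_and_power_response(osc_present: bool, input_power_present: bool) -> Action:
    """Rows 1 and 2 of Table 1: correlate OSC signal with input buffer power."""
    if not osc_present and not input_power_present:
        # Both the OSC and the traffic are gone: the span itself has likely failed.
        return Action.LINE_SWITCH
    if not osc_present and input_power_present:
        # Traffic still arrives, so the local OSC detector is the likely culprit.
        return Action.OSC_DETECTOR_REPORT
    # OSC present: per row 2, do not initiate a ring switch on power loss alone.
    return Action.NONE


def transponder_response(upstream_reports_activity: bool) -> Action:
    """Row 3 of Table 1: one transponder dark."""
    if upstream_reports_activity:
        return Action.EQUIPMENT_SWITCH
    return Action.NONE
```

Keeping each row as a small pure function makes the correlation rule easy to test in isolation, which suits the modest memory and computing power of a local controller.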



It will be understood that the problem list will depend upon the network topology 
and node design. FIGS. 10A and 10B are respective portions of a decision table for a 
preferred embodiment of the invention for use in an optical ring network. The problem 
list for typical metropolitan area networks requires comparatively little local memory and 
computing power, thus facilitating the use of a local controller 430 that is implemented as 
one or more comparatively low cost microprocessors. 

Referring again to FIG. 8, it will be noted that some problems are more easily 
determined than others, depending upon the path of the optical channels through the node 
relative to the stages in the multiplexor section and on the tributary stage. For example, 
several transponders are typically coupled to one CWDM. Consequently, a loss in 
optical output of all of the transponders 840 coupled to a single CWDM may indicate a 
failure of the CWDM. However, a loss in optical output of only some of the transponders 
coupled to the CWDM indicates that a failure of the CWDM is less likely. Similarly, a 
failure of all of the CWDMs coupled to a BWDM is likely to be caused by a failure of the 
BWDM, although a failure of a BWDM in another node is also a possibility. 
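
The inference described above may be sketched as follows, assuming the node's wiring is available as two mappings (CWDM to transponders, BWDM to CWDMs). The function name `suspect_components` and the mapping layout are hypothetical. The sketch names only local suspects and does not rule out an upstream cause, which the description notes remains a possibility.

```python
from typing import Dict, Iterable, List


def suspect_components(cwdm_to_transponders: Dict[str, List[str]],
                       bwdm_to_cwdms: Dict[str, List[str]],
                       dark_transponders: Iterable[str]) -> List[str]:
    """Return the highest-level local components whose failure would explain
    the set of transponders that report no optical output."""
    dark = set(dark_transponders)
    # A CWDM is suspect only if ALL of its transponders are dark.
    dark_cwdms = {c for c, ts in cwdm_to_transponders.items() if ts and set(ts) <= dark}
    suspects: List[str] = []
    # A BWDM is suspect only if ALL of its CWDMs are dark.
    for bwdm, cwdms in bwdm_to_cwdms.items():
        if cwdms and set(cwdms) <= dark_cwdms:
            suspects.append(bwdm)
    # CWDMs already explained by a suspect BWDM are not listed separately.
    explained = {c for b in suspects for c in bwdm_to_cwdms[b]}
    suspects.extend(sorted(dark_cwdms - explained))
    return suspects
```

A partial loss (only some transponders on a CWDM dark) yields no CWDM suspect, matching the observation that a CWDM failure is then less likely.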

FIG. 11 is a functional block diagram of a preferred fault detection and isolation 
system 1100 for implementing the fault detection and isolation function of a local 
controller 430. The system 1100 preferably includes a fault detector 1110, an equipment 
switch engine 1120, and a line switch engine 1130. A path signal manager 1140 (shown 
in phantom) is preferably coupled to software program 1100. Fault detector 1110, 
equipment switch engine 1120, and line switch engine 1130 are preferably implemented 
as software modules of a common software program residing in controller 430. 
However, it will also be understood that fault detector 1110, equipment switch engine 
1120, and line switch engine 1130 may be implemented on separate electrical 
components (e.g., separate dedicated circuits or microcontrollers) or reside as software 
modules on separate microprocessors that are electrically coupled together. 

Path signal manager 1140 provides fault detector 1110 with information 
corresponding to a channel map, i.e., an updated list of all channels that are provisioned 
at various node locations along the optical network. The path signal manager 1140 is 
preferably implemented as a software module that resides on the local controller 430 and 
which receives channel map updates via the OSC channel. However, it will be 
understood that some of the functionality of path signal manager 1140 may reside in a 
central monitoring system (not shown in FIG. 11) that accumulates data received from 
each network node and which publishes channel maps to all of the nodes of the optical 
network. 

The fault detector 1110 is preferably programmed with information corresponding 
to a problem list 1118, such as a problem list similar to those of FIGS. 10A and 10B, for 
correlating faults and deciding if an equipment switch or a line switch should be initiated. 
Fault detector 1110 receives one or more input signals, such as an input signal 1112 from 
each transponder (WCI) indicating a loss of signal to the WCI port, one or more input 
signals 1114 corresponding to loss of signal (LOS) from other optical sensors and optical 
detectors coupled to the local node, and an input signal 1116 corresponding to a circuit 
pack failure (e.g., an improper electrical connection to one or more of the circuit packs 
shown in FIG. 9). The fault detector also has a signal input 1111 corresponding to 
information received from path signal manager 1140. Fault detector 1110 also 
communicates data on channel activity with path signal manager 1140. Additionally, 
fault detector 1110 may also have a separate status report output 1119 for broadcasting 
status reports to other nodes. 

The line switch engine 1130 initiates a line switch if it receives trigger signals 
generated by the fault detector 1110 instructing the line switch engine 1130 to perform a 
line switch to an alternate optical fiber path. As previously described, the problem list 
1118 of fault detector 1110 is preferably programmed such that the fault detector 1110 
instructs line switch engine 1130 to initiate a line switch only if there is a loss of signal in 
both the line-card (e.g., the pre-amp to the node) and the OSC channel. Line switch 
engine 1130 preferably has a line switch notification output 1132 that informs interested 
subsystems (e.g., the fault detector within the node and systems in other nodes) that a line 
switch is complete. A line switch engine 1130 that initiates a line switch in response to a 
trigger signal may be implemented using a variety of conventional line switch engine 
techniques. For example, switch techniques for optical ring networks are described in 
U.S. Pat. No. 5,986,783, "Method and apparatus for operation, protection, and restoration 
of heterogeneous optical communication networks," and U.S. Pat. No. 6,046,833, 
"Method and apparatus for operation, protection, and restoration of heterogeneous optical 
networks." The contents of U.S. Pat. Nos. 5,986,783 and 6,046,833 are hereby 
incorporated by reference in their entirety. 

Equipment switch engine 1120 is coupled to fault detector 1110 and initiates an 
equipment switch when it receives a trigger signal from fault detector 1110. 
Additionally, equipment switch engine 1120 preferably includes a manual equipment 
switch input signal 1128. Input signal 1128 is preferably configured to permit a manual 
equipment switch to be initiated by a field engineer at a local node or by a network 
administrator. Manual equipment switches are useful to facilitate preventive 
maintenance or upgrades of node components. Equipment switch engine 1120 may also 
include a notification signal 1120 communicated to fault detector 1110 and broadcast to 
other nodes (via the OSC) indicating that the equipment switch is completed. 

FIG. 12 is an interaction diagram 1200 showing a preferred sequence of 
interactions for performing an equipment switch using equipment switch engine 1120. 
Fault detector 1110 instructs equipment switch engine 1120 to perform an equipment 
switch 1205. Equipment switch engine 1120 issues a sequence of switch commands to 
switch to redundant back-up components (e.g., WPS/CWDM switches) using a sequence 
of switch protocols 1215. The equipment switch engine 1120 then notifies the fault 
detector 1110 that an equipment switch has occurred 1220, which prompts fault detector 
1110 to test if the fault was corrected by the equipment switch. If the equipment switch 
fails to correct the fault, the results of the equipment switch may be published 1230. 
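
The interaction sequence described above may be sketched as a single routine that issues the switch commands, tests whether the fault cleared, and publishes the result on failure. The function name `run_equipment_switch` and its callback parameters are hypothetical and stand in for the switch protocols and OSC messaging, which are not detailed here.

```python
from typing import Callable, Iterable


def run_equipment_switch(switch_commands: Iterable[Callable[[], None]],
                         fault_cleared: Callable[[], bool],
                         publish: Callable[[str], None]) -> bool:
    """Execute the switch commands in order, then test and report the outcome.

    Returns True if the fault detector's re-test finds the fault corrected.
    """
    # Issue each switch command (e.g. WPS and CWDM switches) in sequence.
    for command in switch_commands:
        command()
    # After being notified of the switch, the fault detector re-tests.
    restored = fault_cleared()
    if not restored:
        # Publish the failed result so other nodes can use it.
        publish("equipment switch did not restore traffic")
    return restored
```

Passing the re-test and the publish step in as callbacks keeps the sketch independent of any particular sensor interface or message format.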

FIG. 13 is an interaction diagram showing a preferred sequence of steps for 
performing a line switch. A line switch causes a short interruption in network traffic. 
Consequently, it is desirable for the fault detector 1110 to perform a sequence of steps 
that are intended to minimize the number of unnecessary line switches. As shown in 
FIG. 13, the fault detector 1110 continuously receives signal inputs related to potential 
faults 1305. If the fault detector 1110 observes an event that may be a line fault, the fault 
detector 1110 performs a step 1320 to see if the node is in the process of being equipment 
switched and disables the line switch if an equipment switch is in progress. In step 1330 
fault detector 1110 then confirms that there is no received input power and no signal in 
the OSC channel. In step 1340 fault detector 1110 may also verify that the OSC signal 
detector is not disabled. In step 1350 fault detector 1110 then checks to see if a circuit 
pack card is missing. In step 1360 fault detector 1110 may then check to see if the 
protection path is operable. The fault detector may then test to see if the OSC signal has 
degraded, as shown in 1370, prior to the step of triggering a line switch 1380. 
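
The gating checks of steps 1320 through 1360 may be sketched as a chain of early exits, so that a line switch is triggered only when every check passes. The `NodeState` fields and the function name are hypothetical. The sketch assumes that a missing circuit pack or an inoperable protection path suppresses the line switch, and it omits the OSC degradation test of step 1370 because the description does not state how that result affects the decision.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of the inputs consulted before a line switch."""
    equipment_switch_in_progress: bool
    input_power_present: bool
    osc_signal_present: bool
    osc_detector_enabled: bool
    circuit_pack_missing: bool
    protection_path_operable: bool


def should_trigger_line_switch(s: NodeState) -> bool:
    """Return True only if all pre-checks allow a line switch."""
    if s.equipment_switch_in_progress:                 # step 1320
        return False
    if s.input_power_present or s.osc_signal_present:  # step 1330
        return False
    if not s.osc_detector_enabled:                     # step 1340
        return False
    if s.circuit_pack_missing:                         # step 1350 (assumed to suppress)
        return False
    if not s.protection_path_operable:                 # step 1360 (assumed to suppress)
        return False
    return True
```

Ordering the checks from cheapest to most specific means an unnecessary line switch, and the short traffic interruption it causes, is avoided as early as possible.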

For an arbitrary arrangement of optical components in a node 700, 800 the 
channels that are supposed to be coupled through each block element may be labeled, 
permitting, for example, a graph, tree, or table to be prepared showing a list of 
components in the local node or upstream nodes which could have failed to account for a 
loss of specific channels in the local node. A corresponding list of other local node and 
neighbor conditions for each element in the list may also be generated to further limit the 
list of components likely to have failed, based upon the information available from the 
nodes and other information that a field technician can later acquire using conventional 
field analysis techniques. The microprocessor and persistent memory preferably retain a 
history of the detected events and response to the fault. This information is preferably 
made available (e.g., via a display) to a field engineer along with a list of components 
likely to have failed and additional action items to confirm which component failed. This 
information can be in the form of a simple audio visual display (e.g., one or more light 
emitting diodes or a liquid crystal display) or may be in the form of a numeric code 
accessible by the field engineer. Alternately, the information can be presented in the 
form of a tutorial to guide a field engineer through the steps of an isolation tree. For 
example, the tutorial could be presented via a monitor (e.g., a liquid crystal display monitor) 
coupled to the local node or via a portable computer coupled to the node. 
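
The channel labeling described above may be sketched as a table mapping each component to the channels it is supposed to carry, from which a candidate list is derived for a given set of lost channels. The function name `candidate_components` is hypothetical, and the sketch adds one assumption not stated in the description: a component that still passes at least one active channel is treated as not having failed.

```python
from typing import Dict, Iterable, List, Set


def candidate_components(component_channels: Dict[str, Set[int]],
                         lost: Iterable[int],
                         active: Iterable[int]) -> List[str]:
    """List components whose failure could account for the lost channels."""
    lost_set, active_set = set(lost), set(active)
    if not lost_set:
        return []
    return [
        name
        for name, carried in component_channels.items()
        # The component must carry every lost channel ...
        if lost_set <= carried
        # ... and (assumption) none of its channels may still be active.
        and not (carried & active_set)
    ]
```

The resulting list is only a starting point for the field engineer; the additional local and neighbor conditions mentioned above would narrow it further.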






Referring again to FIG. 4, in an alternate embodiment each node may receive 
information on channel activity from an OSA 440. The use of OSAs 440 increases the 
cost of the network. However, a benefit of using OSAs is that each node 405 may 
receive accurate information from the OSA on the channel activity upstream or 
downstream of the node. A problem list for an embodiment utilizing OSAs 440 would 
include information coupled to a node 405 by one or more OSAs. In particular, a benefit 
of using an OSA 440 is that it permits the channel activity of all of the channels to be 
measured upstream or downstream of a node. This permits an equipment fault to be 
isolated to a particular node, i.e., to the node where a particular channel or band of 
channels is dropped. Additionally, the information from an OSA 440 permits the 
effectiveness of an equipment switch to be determined by a node 405. For example, an 
OSA 440 downstream of a node may be used to collect information regarding whether an 
equipment switch instance restored traffic in one or more dropped channels. 
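
The isolation of an equipment fault to a particular node using OSA measurements may be sketched as a scan along the signal path for the first node at which a channel is present upstream but absent downstream. The function name `locate_drop` and the tuple layout of the readings are hypothetical, and the sketch assumes the channel is provisioned to pass through every listed node (a channel intentionally dropped at a node would otherwise be flagged).

```python
from typing import List, Optional, Set, Tuple

# (node id, channels measured upstream of the node, channels measured downstream)
Reading = Tuple[str, Set[int], Set[int]]


def locate_drop(readings: List[Reading], channel: int) -> Optional[str]:
    """Return the first node, in path order, where the channel disappears."""
    for node, upstream, downstream in readings:
        if channel in upstream and channel not in downstream:
            return node
    return None
```

The same comparison, repeated after an equipment switch, indicates whether the switch restored the channel at that node.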

While particular embodiments and applications of the present invention have been 
illustrated and described, it is to be understood that the invention is not limited to the 
precise construction and components disclosed herein and that various modifications, 
changes and variations which will be apparent to those skilled in the art may be made in 
the arrangement, operation and details of the method and apparatus of the present 
invention disclosed herein without departing from the spirit and scope of the invention as 
defined in the appended claims. 